Bandwidth-Aware Resource Management for Extreme Scale Systems

Authors

  • Zhou Zhou
  • Xu Yang
  • Zhiling Lan
  • Paul Rich
  • Wei Tang
  • Vitali Morozov
  • Narayan Desai
Abstract

As systems scale towards exascale, many resources will become increasingly constrained. While some of these resources have historically been explicitly allocated, many, such as network bandwidth, I/O bandwidth, and power, have not. As systems continue to evolve, we expect many such resources to become explicitly managed. This change will pose critical challenges to resource management and job scheduling. In this paper, we explore bandwidth-aware resource management for Blue Gene systems, where the partition-based interconnect architecture provides a unique opportunity to explicitly allocate bandwidth to jobs. We investigate the value of bandwidth awareness and present a bandwidth-aware resource management design for these systems.

Related articles

Towards Next Generation Resource Management at Extreme-Scales

With the exponential growth of distributed systems in both FLOPS and parallelism (number of cores/threads), scientific applications are growing more diverse with various workloads. These workloads include traditional large-scale high performance computing (HPC) MPI jobs, and HPC ensemble workloads that support the investigation of parameter sweeps using many small-scale coordinated jobs, as wel...

Jointly power and bandwidth allocation for a heterogeneous satellite network

Due to lack of resources such as transmission power and bandwidth in satellite systems, resource allocation problem is a very important challenge. Nowadays, new heterogeneous network includes one or more satellites besides terrestrial infrastructure, so that it is considered that each satellite has multi-beam to increase capacity. This type of structure is suitable for a new generation of commu...

Towards Measuring the Project Management Process During Large Scale Software System Implementation Phase

Project management is an important factor to accomplish the decision to implement large-scale software systems (LSS) in a successful manner. The effective project management comes into play to plan, coordinate and control such a complex project. Project management factor has been argued as one of the important Critical Success Factor (CSF), which need to be measured and monitored carefully duri...

A characterization of workflow management systems for extreme-scale applications

Automation of the execution of computational tasks is at the heart of improving scientific productivity. Over the last years, scientific workflows have been established as an important abstraction that captures data processing and computation of large and complex scientific applications. By allowing scientists to model and express entire data processing steps and their dependencies, workflow ma...

I/O-aware bandwidth allocation for petascale computing systems

In the Big Data era, the gap between the storage performance and an application’s I/O requirement is increasing. I/O congestion caused by concurrent storage accesses from multiple applications is inevitable and severely harms the performance. Conventional approaches either focus on optimizing an application’s access pattern individually or handle I/O requests on a low-level storage layer withou...

Publication date: 2014